Dataset Condensation
Supplementary Materials for "Private Set Generation with Discriminative Information"
To compute the privacy cost of our approach, we numerically compute $D_\alpha(\mathcal{M}(D)\,\|\,\mathcal{M}(D'))$ in Definition A.1 for a range of orders $\alpha$ [9, 14] in each training step that requires access to the real gradient $g^D_\theta$. In comparison to normal non-private training, the major part of the additional memory and computation cost is introduced by the DP-SGD [1] step (for the per-sample gradient computation) that sanitizes the parameter gradient on real data, while the other steps (including the update on $S$, and the updates of $F(\cdot;\theta)$ on $S$) are equivalent to multiple calls of the normal non-private forward and backward passes (whose costs have lower magnitude than the DP-SGD step).

GS-WGAN [3]

We adopt the default configuration provided by the official implementation ($\varepsilon=10$): the subsampling rate $=1/1000$, DP noise scale $\sigma=1.07$, and batch size $=32$. Following [3], we pretrain (warm-start) the model for 2K iterations, and subsequently train for 20K iterations. The experiments presented in Section 5.2 of the main paper correspond to the class-incremental learning setting [10], where the data partition at each stage contains data from disjoint subsets of label classes.
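The gradient-sanitization step that dominates the extra cost can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: it assumes per-sample gradients have already been computed (as rows of a matrix), and the function and parameter names are hypothetical.

```python
import numpy as np

def dp_sgd_sanitize(per_sample_grads, clip_norm=1.0, noise_scale=1.07, rng=None):
    """Clip each per-sample gradient to `clip_norm`, sum, add Gaussian noise, average."""
    rng = np.random.default_rng(0) if rng is None else rng
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    # Scale down any gradient whose L2 norm exceeds the clipping bound.
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_sample_grads * factors
    # Noise std is sigma * C; dividing by batch size yields the sanitized mean gradient.
    noise = rng.normal(0.0, noise_scale * clip_norm, size=clipped.shape[1])
    return (clipped.sum(axis=0) + noise) / per_sample_grads.shape[0]

grads = np.array([[3.0, 4.0], [0.1, 0.2]])   # toy per-sample gradients
g = dp_sgd_sanitize(grads, clip_norm=1.0, noise_scale=1.07)
```

The per-sample clipping is what makes this step memory-heavy relative to ordinary training, since a separate gradient must be materialized for every example before aggregation.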
CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting
The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting). This challenge arises from disparities in the evaluation of synthetic data. In classification, the synthetic data is considered well-distilled if the model trained with the full dataset and the model trained with the synthetic dataset yield identical labels for the same input, regardless of variations in output logits distribution. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between predictions of the two models. The synthetic data is deemed well-distilled only when all data points within the predictions are similar.
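The evaluation disparity described above can be made concrete with a small sketch (illustrative helper names, not CondTSF's API): classification only requires the two models' argmax labels to agree, while TS-forecasting requires every predicted data point to be close.

```python
import numpy as np

def labels_agree(logits_full, logits_syn):
    # Classification view: well-distilled if the predicted labels match,
    # even when the output logit distributions differ.
    return np.array_equal(np.argmax(logits_full, axis=1),
                          np.argmax(logits_syn, axis=1))

def forecasts_close(pred_full, pred_syn, tol=0.1):
    # Forecasting view: every data point in the predictions must be similar.
    return bool(np.all(np.abs(pred_full - pred_syn) <= tol))

# Different logit distributions, identical labels: fine for classification.
logits_a = np.array([[2.0, 1.0], [0.2, 0.9]])
logits_b = np.array([[5.0, -1.0], [0.1, 3.0]])
assert labels_agree(logits_a, logits_b)
```

A single outlying point in a forecast fails `forecasts_close` even if the overall shape matches, which is exactly why label-agreement-style condensation objectives transfer poorly to forecasting.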
Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective
We present a new dataset condensation framework termed Squeeze, Recover and Relabel (SRe$^2$L) that decouples the bilevel optimization of model and synthetic data during training, to handle varying scales of datasets, model architectures and image resolutions for efficient dataset condensation. The proposed method demonstrates flexibility across diverse dataset scales and exhibits multiple advantages in terms of arbitrary resolutions of synthesized images, low training cost and memory consumption with high-resolution synthesis, and the ability to scale up to arbitrary evaluation network architectures. Extensive experiments are conducted on Tiny-ImageNet and full ImageNet-1K datasets. Under 50 IPC, our approach achieves the highest 42.5\% and 60.8\% validation accuracy on Tiny-ImageNet and ImageNet-1K, outperforming all previous state-of-the-art methods by margins of 14.5\% and 32.9\%, respectively.
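The decoupling can be illustrated with a toy sketch: the model is trained once and frozen (squeeze), synthetic data is then optimized against the frozen model's statistics (recover), and finally relabeled by the teacher. This is a 1-layer linear stand-in under assumed names, where mean matching stands in for BN-statistic matching; it is not the official SRe$^2$L code.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
y = X @ rng.normal(size=(8,))

# Squeeze: train the model once on the full data (closed-form least squares here).
w_teacher, *_ = np.linalg.lstsq(X, y, rcond=None)

# Recover: optimize synthetic inputs so their statistics match the real data's
# (first-moment matching stands in for BatchNorm-statistic matching).
S = rng.normal(size=(16, 8))
for _ in range(200):
    grad = 2 * (S.mean(axis=0) - X.mean(axis=0)) / len(S)  # d/dS of squared moment gap
    S -= 0.5 * grad

# Relabel: generate soft targets for the synthetic data with the frozen teacher.
y_syn = S @ w_teacher
```

Note that no gradient ever flows through the teacher's training: the data-synthesis loop only queries the frozen model, which is the decoupling of the bilevel optimization that the abstract describes.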
Elucidating the Design Space of Dataset Condensation
Dataset condensation, a concept within $\textit{data-centric learning}$, aims to efficiently transfer critical attributes from an original dataset to a synthetic version, while maintaining both the diversity and realism of the syntheses. This approach can significantly improve model training efficiency and is also adaptable to multiple application areas. Previous methods in dataset condensation have faced several challenges: some incur high computational costs that limit scalability to larger datasets ($\textit{e.g.,}$ MTT, DREAM, and TESLA), while others are restricted to less optimal design spaces, which can hinder potential improvements, especially on smaller datasets ($\textit{e.g.,}$ SRe$^2$L, G-VBSM, and RDED). To address these limitations, we propose a comprehensive design-centric framework that includes specific, effective strategies such as implementing soft category-aware matching, adjusting the learning-rate schedule, and applying a small batch size. These strategies are grounded in both empirical evidence and theoretical backing.
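As one concrete illustration of the learning-rate strategy named above, an adjusted schedule could look like the warmup-plus-cosine-decay sketch below. The hyperparameters and shape of the schedule are generic assumptions for illustration, not the paper's exact configuration.

```python
import math

def lr_at(step, total_steps, base_lr=0.1, warmup=100):
    """Linear warmup for `warmup` steps, then cosine decay to ~0."""
    if step < warmup:
        return base_lr * (step + 1) / warmup           # linear warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))  # cosine decay
```

A smooth decay like this pairs naturally with small batch sizes, since noisier small-batch gradients benefit from a learning rate that shrinks gradually rather than in abrupt steps.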